Squish: Near-Optimal Compression for Archival of Relational Datasets

Abstract

Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets.
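
To illustrate why modeling dependencies among attributes can shrink a table, the sketch below compares two entropy bounds on a toy relation. It is not Squish's implementation: the table, the column choice, and the helper names `entropy_bits` and `conditional_entropy_bits` are all hypothetical, and the single parent-to-child edge stands in for a full Bayesian network. An ideal arithmetic coder driven by a given model would approach that model's bound in bits per row.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (bits per symbol) of an empirical distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy_bits(rows, parent, child):
    """Estimate H(child | parent) in bits per row from the rows themselves."""
    groups = {}
    for row in rows:
        groups.setdefault(row[parent], Counter())[row[child]] += 1
    n = len(rows)
    # Weight each parent value's entropy by how often that value occurs.
    return sum(sum(g.values()) / n * entropy_bits(g) for g in groups.values())


# Hypothetical toy table of (city, state); state is fully determined by city.
rows = [
    ("Chicago", "IL"), ("Chicago", "IL"),
    ("Urbana", "IL"), ("Boston", "MA"),
    ("Boston", "MA"), ("Austin", "TX"),
    ("Austin", "TX"), ("Urbana", "IL"),
]

h_city = entropy_bits(Counter(r[0] for r in rows))
h_state = entropy_bits(Counter(r[1] for r in rows))
h_state_given_city = conditional_entropy_bits(rows, parent=0, child=1)

# Model 1: treat the columns as independent (ignores table structure).
independent_bits = h_city + h_state
# Model 2: code state conditioned on city (one dependency edge).
dependent_bits = h_city + h_state_given_city

print(f"independent model: {independent_bits:.2f} bits/row")
print(f"dependency-aware model: {dependent_bits:.2f} bits/row")
```

On this toy table the independent model needs 3.50 bits per row, while the dependency-aware model needs 2.00, because knowing the city leaves no uncertainty about the state. Real datasets rarely have dependencies this clean, so the gap is usually smaller.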